Using Uncertainty Intervals to Analyze Confidentiality Rules for Magnitude Data in Tables

Author

  • Paul B. Massell
Abstract

Protecting the confidentiality of survey respondent data is related to the notion of data user uncertainty in various ways. The source of uncertainty that is most frequently exploited by agencies in formulating protection rules for tabular data is the fact that there is often more than one respondent (e.g., a company) contributing to a given table cell value. Agencies are required to protect these individual contributions. The uncertainty in a data user’s mind about how the published cell value is distributed among the contributions is often sufficient to protect them. This “cell value distributional uncertainty” may be the most exploited source of uncertainty, but it is by no means the only one. Data user uncertainty about respondent contributions is created through many of the procedures involved in the design of a survey and in processing the collected data. It is usually possible to express a given data user’s uncertainty about a particular respondent’s contribution to a particular cell as a finite interval. The interval may be derived from inequalities associated with the table’s additivity or it may be based on “knowledge models” that describe, for example, the data user’s prior (approximate) knowledge of respondent contributions or sampling weights. We call such intervals “uncertainty intervals”. Sometimes the knowledge models may allow a probability distribution to be defined on the uncertainty interval. The major thesis of this paper is that uncertainty intervals can be used as a means of unifying the description of many of these sources of uncertainty. We show how uncertainty intervals can unify the description of several formulas and algorithms that are frequently used during the process of protecting data, e.g., those related to the p% rule, sliding and two-sided protection, cell value rounding, and weights applied to the underlying microdata. In future work, the author hopes to extend this approach to additional sources of uncertainty.
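As a rough illustration of how an uncertainty interval can describe two of the rules named in the abstract, the sketch below treats the second-largest contributor as the data user and derives the interval for the largest contribution from the cell's additivity. It assumes the commonly stated form of the p% rule (a cell is sensitive when the remainder after removing the two largest contributions is less than p% of the largest) and conventional rounding to the nearest multiple of a base. The names `Interval`, `additivity_interval`, `is_sensitive_p_percent`, and `rounding_interval` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    """Closed interval [lower, upper] of values the data user cannot rule out."""
    lower: float
    upper: float


def additivity_interval(contributions: Sequence[float]) -> Interval:
    """Interval for the largest contribution as seen by the second-largest contributor.

    Assumption: the data user knows the published cell total and their own
    contribution, and knows that all contributions are nonnegative.
    """
    ordered = sorted(contributions, reverse=True)
    total = sum(ordered)
    own = ordered[1] if len(ordered) > 1 else 0.0
    # Additivity gives the upper bound; nonnegativity gives the lower bound.
    return Interval(0.0, total - own)


def is_sensitive_p_percent(contributions: Sequence[float], p: float) -> bool:
    """p% rule (assumed form): sensitive if the upper bound is within p% of the truth."""
    largest = max(contributions)
    upper = additivity_interval(contributions).upper
    return upper - largest < (p / 100.0) * largest


def rounding_interval(published: float, base: float) -> Interval:
    """Interval for the true cell value when it is rounded to the nearest multiple of base."""
    half = base / 2.0
    return Interval(published - half, published + half)


if __name__ == "__main__":
    cell = [100.0, 50.0, 5.0, 3.0]
    print(additivity_interval(cell))
    print(is_sensitive_p_percent(cell, p=10))
    print(rounding_interval(160.0, base=10.0))
```

In this framing the p% rule is a statement about how close the upper endpoint of the interval lies to the true value, and rounding widens the interval around the published value by half the rounding base on each side.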


Related resources

Bayesian Nonparametric Disclosure Risk Estimation via Mixed Effects Log-linear Models

Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk...


WP. 30 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)

We extend the safety rules used for the Statistical Disclosure Control of magnitude tables to include an intruder who models the ignorance about an unknown confidential quantity with a Uniform distribution. By applying this extension to the generalised p-rule we obtain the safety rules useful also in the presence of groups of respondents. The corresponding disclosure rules for different prior k...


Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A case Study of Semi-Concentrated Doctoral Exam

Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...


Confidence Intervals for OD Demand Estimation

Representative origin-destination (OD) demand tables are a crucial part of making many transportation models relevant to practice. However estimating these OD tables is a challenging problem, even more so determining the confidence intervals on these OD estimates. In this work we propose a method to construct estimates and confidence intervals of OD demand tables from link flow data. Our method...


A Hybrid Model of Heart Anomalies Detection by Processing Heart Sounds

Introduction: Different factors are effective in detecting heart abnormalities. The greater the number of these factors, the greater the uncertainty in the detection of heart abnormalities. In the uncertainty condition in response of prediction model, the fuzzy systems are one of the most effective methods for generating an acceptable response. Method: In this applied study, 3240 records rela...



Publication date: 2006